home *** CD-ROM | disk | FTP | other *** search
- .TL
- The MMDF Database
- .br
- (somewhat obsolete)
- .AU
- Steve Kille
- .AI
- Department of Computer Science
- University College London
- .NH
- Overview
- .PP
- This design specifies modifications to the database access procedures
- of the MMDF mail system.
- The changes are motivated by the expansion of the names database for the
- SRI environment.
- The major changes involve the reorganization of the database for more
- efficient random access methods that would reduce the processing
- inefficiencies of the sequential access of large databases.
- The changes take advantage of the database management dbm(3)[*]
- .FS
- [*] The notation dbm(3) refers to the documentation as found in the Unix
- Programmer's Manual, Vol. 1, e.g., the database documentation is in
- Section 3 under the title
- .B dbm.
- .FE
- libraries
- of V7 Unix.
-
- .NH 1
- Scope of Changes
- .PP
- All changes are restricted to two modules of MMDF.
- One module is part of the
- .B submit
- process
- and the other module is shared by the
- .B submit,
- .B deliver,
- and
- .B channel
- processes.
- There are potential efficiencies that could be gained in other modules due
- to the use of the new design but these changes are left for a future effort.
- The include file
- .B ch.h
- is compacted by eliminating obsolete fields in the channel
- structures.
-
- .NH 2
- Affected Modules
- .PP
- The two modules that are affected are
- .B ch_table.c
- and
- .B adr_submit.c.
- .B ch_table
- is completely rewritten to use the dbm(3) library.
- Two procedures are respecified because their functions have been altered.
- The remaining procedures are reimplemented.
- The procedure
- .B loc_alsrch
- in the module
- .B adr_submit.c
- is modified to remove references to and logic for the obsolete procedure
- .B ch_tseek
- and to add logic to use the procedure
- .B ch_tsetnum.
- .NH 2
- Interfaces
- .PP
- The interfaces to the public procedures of ch_table.c are preserved.
- The re-implemented procedures have the same names, accept the same
- arguments, and
- return the same results as the procedures they replace.
- The respecified procedures have different names and arguments to
- ensure that incompatible calls to the old procedures would be caught and
- prevented from improperly linking to the new procedures.
- Some of the re-implemented procedures have a boolean argument that indicates
- whether the search
- should start at the beginning of the file or where the pointer is currently
- located.
- The argument now indicates whether the partial ordering information stored
- in the database should be used to detect possible name circuits.
-
- .NH 2
- Affected Procedures
- .PP
- Only one procedure,
- .B loc_alsrch,
- in the module
- .B adr_submit.c
- is modified.
- The modification is described above.
- All other procedures in this section are in the module
- .B ch_table.c.
- .SH
- ch_tblopen
- .IP
- This procedure is removed completely and
- its replacement,
- .B ch_tdbminit,
- is made completely internal to the module.
- It calls the
- .B dbminit
- procedure for the first database access
- to initialize the database.
-
- .SH
- ch_tseek
- .IP
- This procedure is removed completely.
- It is replaced by the procedure
- .B ch_tsetseq
- which sets the minimum acceptable point for an entry to be in the partial
- ordering of the database.
- .SH
- ch_ttell
- .IP
- This is really a macro defined in the file
- .B ch.h.
- It is removed and the procedure
- .B ch_tgetseq
- is defined in its place to return the partial ordering of the most recently
- accessed entry in the database.
- .SH
- ch_h2adr
- .IP
- This procedure is rewritten internally to use the dbm(3) library.
- The argument
- .B first
- is redefined to be a flag to indicate whether partial ordering of the entries
- is to be enforced.
- All other aspects of the procedure remain the same.
- .SH
- ch_ad2first
- .IP
- This procedure is rewritten internally to use the dbm(3) library.
- The external interface for this procedure remains the same.
- .SH
- ch_h2first
- .IP
- This procedure is rewritten internally to use the dbm(3) library.
- The argument
- .B frst
- is redefined to be a flag to indicate whether partial ordering of the entries
- is to be enforced.
- All other aspects of the procedure remain the same.
- .SH
- ch_h2chan
- .IP
- This procedure is rewritten internally to use the dbm(3) library.
- The external interface for this procedure remains the same.
- .PP
- The procedures no longer access each other internally.
- They instead set up their own calls to the dbm(3) procedure
- .B fetch
- and process the returned data themselves.
- The linkages between data records is defined in the logic of the access
- procedures.
-
- .NH 1
- Addressing Database Design
- .PP
- The support provided by the dbm(3) library consists of procedures to
- store and retrieve data items using arbitrary ascii strings as keys.
- Since no provision is made for accessing the internal representation of
- keys or data pointers, all linkages are maintained as ascii keys.
- The interpretation of a data item as a key for further searches of the
- database is therefore embedded in the MMDF access procedures.
- The subsections below define these relationships between data items and
- other keys.
-
- .NH 2
- Database Internals
- .PP
- The sequential database is composed of lists of key and value pairs.
- Each list is a separate text file that is appropriate only in certain
- contexts.
- An address or name is either a mailbox or a host depending on whether it
- is in the file used for aliases or the file used for channel host addresses.
- The formats however are identical.
- .PP
- A data entry is composed of three components; the keyword that is used for
- access, a partial ordering number, and a list of one or more values
- separated by FS ('\\034') characters.
- See Figure 2 and Section 5.2.1 for the details and format of the entry.
-
- .PP
- A value is composed of a channel code character followed by a printable
- ascii string.
- There are three basic types of values used in the data entries.
- They provide single, multiple, and indirect values for the keys.
- All of the types are transparent to the database procedures and are returned
- unmodified to the calling procedures for processing.
-
- .IP Single 10
- This is the simplest form of database value.
- It is a single ascii string that is passed back to the address processor.
- The value can be the real name associated with an alias, a host name
- associated with a channel, or a host address associated with a host name.
-
- .IP Multiple 10
- A multiple value is a comma ',' separated list of single values.
- All of the single values are associated simultaneously with the key.
- Such a value is used for small distribution lists or for aliasing a name
- to mailboxes for a user on different machines.
-
- .IP Indirect 10
- The indirect value is an extension of the multiple value.
- A value that starts with a left angle bracket '<' character is treated
- as a filename (not including the '<') where a list of values may be found.
- Distribution lists are implemented with indirect values.
- The referred to file is not searched but is read sequentially for
- values that are then recursively processed against the database.
-
- .PP
- Only one file is used in the dbm(3) library[*].
- .FS
- [*] There are really two files, one data file and one key hash file,
- but they are considered as one in this context since they are intimately
- connected in the dbm(3) library.
- .FE
- Identical keys that are in different lists in the sequential files are
- stored under one key in the random access database.
- The storing of identical keys is described in the next section.
-
- .NH 3
- Partial Ordering Numbers
- .PP
- The ordering of entries in the sequential database is used by the sequential
- database to detect name circuits.
- The first two bytes of an entry datum are an unsigned 16 bit ordering number
- that indicates the partial ordering between any two entries in the database.
- The numbers are assigned by the database management program.
- Section 3.2 Name Circuit Prevention describes its use.
-
- .NH 3
- Channel Encoding
- .PP
- The context information implied in the multiple files used for the sequential
- database must be preserved in the random access database.
- This context is preserved by prepending a code character to the value.
- This code is either the official name code or the code character
- assigned to a channel.
- The code is used both to check whether the value is valid
- for a particular channel and to map a name(host) to a channel.
- .PP
- Identical keys may be stored in more than one file in the sequential database.
- This feature is implemented in the random access database by concatenating
- the entries from the multiple files and separating them by the FS ('\\034')
- character.
- The first character of each entry in such a list is the code character for
- that entry.
- Note that the first character is a code only for these multiple entries and
- not for the multiple comma ',' separated values described above.
- The code character is stripped by the procedures and not returned.
- The sequential order of the entries implies the partial ordering imposed
- in the sequential database.
-
- .NH 3
- Official Names
- .PP
- An official name is the primary identifier for a host.
- Other names are allowed but the official name is the only name that is
- universally known by all other hosts.
- Alternate or alias names are available but MMDF converts these names to
- the official name before the address is used in the system.
- .PP
- By convention, the key associated with the first occurrence of a value
- (address) in the sequential file is the official name.
- It may be found from an address by searching for the first occurrence of the
- address in the value field and returning the key as the official name.
- The official name is found from an alternate by first finding the alternate
- and then by taking its value (address) and searching for the its first
- occurrence.
-
- .PP
- There are two entries stored in the database for an official name.
- One is a normal name-key with address-entry and the other is
- an address-key and name-entry.
- The name-entry has a code character of '='.
- An official name is recognized when the database is being built and the
- inverse of the entry is generated by exchanging key and entry and storing
- the result.
- The official name is also stored as the first entry of all other (alias)
- entries that have the same address as the official name.
- The official name is retrieved from a value (address) by retrieving
- the entry under the address key.
- It is retrieved from an alternate name by retrieving the entry with the
- alternate name keyword and then selecting the official name entry.
-
- .NH 3
- Name List Redirection
- .PP
- Name list redirection is not part of the sequential search database
- procedures.
- The procedure
- .B alst_proc
- parses the value returned from reading the current address.
- The address line may either come from an input line or
- a database value due to the recursive
- logic of the address parsing procedures.
- The redirection is done on all addresses that start with the '<' character.
- This character is transparent to the database system and is considered part
- of the returned value.
- The partial ordering numbers are used by the database procedures to detect and
- reject name circuits generated when keywords in the indirect files do not
- map to out of order entries in the database.
- .NH 3
- Reserved Code Characters
- .PP
- The only reserved code character is '='.
- This character may not be used for channel codes because it is used to
- identify an official name.
- It should be considered to be the official name channel code character.
- The FS ('\\034') character is also reserved for use as the entry separator.
- All other characters are acceptable.
-
- .NH 2
- Name Circuit Prevention
- .PP
- A name circuit exists when a name key is self referencing though an
- arbitrary number of name translations.
- Such a name would put the address checking procedures in an infinite loop.
- The sequential database defeats some cases of self reference by forcing
- the search of any one list at any one level to be one pass.
- This algorithm forces a partial ordering of all addresses;
- but it is only enforced for one file at a time.
- .PP
- The random access database contains an ordering number for each entry in the
- database that identifies the order in which the entries were stored in the
- database.
- This partial ordering of the
- entries prevents name circuits by forcing all translations to be
- forward references that would eventually terminate with the end of the list.
- The check for partial ordering is under the control of the argument to the
- the applicable procedures that controlled whether the sequential search
- started from the beginning of the file.
- The partial ordering is not enforced if the argument is TRUE to be compatible
- with the old procedures.
-
- .NH 1
- Database Access Procedures
- .PP
- This section describes the algorithms used to implement the database with
- the dbm(3) package.
- These procedures replace those procedures of the same name in the file
- .B ch_table.c
-
- .NH 2
- Access Initialization
- .PP
- The procedure
- .B ch_tblopen
- is removed and it is replaced by the procedure
- .B ch_tdbminit
- which is callable only by the procedures
- within the module.
- There is no argument list.
- An internal static flag is maintained by the procedure to
- record if the database file is initialized.
- During the course of this procedure it will call the dbm(3) procedure
- .B dbminit
- and set the flag so further calls to the procedure will not attempt to
- initialize the database again.
-
- .NH 2
- Partial Ordering Control
- .PP
- The old procedure
- .B ch_tseek
- and the macro
- .B ch_ttell
- are deleted from the file along with all references to them.
- They are replaced by the procedures
- .B ch_tsetseq
- and
- .B ch_tgetseq
- which control the partial ordering in the database.
- .B ch_tgetseq
- returns the sequence number of the most recently accessed entry and
- .B ch_tsetseq
- changes the internal sequence number value to its argument.
- A sequence number is a value of type unsigned short.
-
- .NH 2
- Database Record Access
- .PP
- All records in the database are accessed by the internal procedure
- .B ch_fetch.
- It correctly forms its string argument into a key for the dbm(3) procedure
- .B fetch
- and decomposes the returned datum into the partial ordering number and
- a vector of char pointers to the individual values.
- The number and the vector are stored in the data structure passed to the
- procedure.
- Success return is indicated by returning the pointer to the data structure
- and failure is indicated by a NULL pointer.
-
- .NH 2
- Key to Value Access
- .PP
- This function is done by the procedure
- .B ch_h2adr.
- The 'name' parameter is passed to the internal procedure
- .B ch_fetch.
- The returned structure may have multiple entries because the same key may
- be used for more than one channel or alias list.
- The code character of each entry is checked against the code character
- stored in the channel structure and
- the entry that matches is returned in the pointer parameter 'buf' with its
- code character stripped off.
- TRUE is returned if a match was found and FALSE is returned if
- .B ch_fetch
- can not find the key or if the code character for the channel does not
- match an entry in the datum.
-
- .NH 2
- Value to Primary Key Access
- .PP
- This function is performed by the procedure
- .B ch_ad2first.
- The parameter 'adrstr' is passed to the procedure
- .B ch_fetch
- as described above and the returned structure is scanned for an entry whose
- code character matches the official name code character '='.
- The result is returned via the pointer parameter 'buf'.
- TRUE is returned if a match was found and FALSE is returned if
- .B ch_fetch
- can not find the key or if there is no official name.
-
- .NH 2
- Key to Primary Key Access
- .PP
- This function is performed by the procedure
- .B ch_h2first.
- The parameter 'str' is passed to the procedure
- .B ch_fetch
- and the returned structure is searched for an official name entry.
- The official name is returned via
- the pointer parameter 'buf'.
- TRUE is returned if a match was found and FALSE is returned if
- .B ch_fetch
- can not find the key or if there is no official name entry.
- The parameter 'str' is returned if no official name is found.
-
- .NH 2
- Key to Any Channel Access
- .PP
- This function is performed by the procedure
- .B ch_h2chan.
- The parameter 'hostr' is passed to the procedure
- .B ch_fetch.
- The pointer parameter 'adrstr' is used to return either the official name
- if it is found or the parameter 'adrstr' if none is found.
- The procedure returns NOTOK if there is no official name.
- Each value except the official name in the entry is used in turn to find
- the first channel available for the key.
- The code character of the value is passed to the procedure
- .B ch_c2struct
- which will either return a channel pointer or the value NOTOK.
- If there is no channel to match any of the code characters then
- .B ch_h2chan
- also returns NOTOK.
- If the channel found by
- .B ch_c2struct
- is the local channel and local delivery is ok then the procedure returns OK.
- Otherwise, it returns the channel pointer.
-
- .NH 1
- Database Management Program
- .PP
- The program
- .B dbmbuild
- generates the database from ascii files of the same format
- as the sequential database.
- It does not attempt to do updates to an existing database; rather, it
- regenerates the entire database each time and the resultant files are
- moved into place with other commands.
-
- .NH 2
- Inputs
- .PP
- The input to the program is in the current sequential file format.
- The code character is passed to the program as a command line argument.
- A command line argument selects official name processing; the default
- action is no processing.
-
- .KS
- .DS
- line ::= text eol
- text ::= comment | entry
- comment ::= '#' <any printable ascii string>
- entry ::= key sep value
- key ::= <lower case printable ascii string>
- sep ::= ':' | <space> | <tab>
- value ::= <printable ascii string with optional spaces or tabs>
- eol ::= '\e012'
- .ce
- Fig . 1, Input Line format
- .DE
- .KE
- .NH 2
- Outputs
- .PP
- The output file is generated into two files.
- The file\fI.dir\fP file contains a bitmap directory of the actual
- keys and values
- located in the file\fI.pag\fP file.
- The file is generated by the
- .B store
- procedure of the dbm(3) library.
- See the Unix Programmer's Manual, Vol 1, dbm(3x), for further details on
- the database library.
-
- .NH 3
- Record Format
- .PP
- A record consists of a key and a datum.
- The key is an arbitrary ascii text string but for matching purposes
- it is all lower case with all extraneous white space compressed out.
- The datum is an arbitrary variable length byte string.
- The length is controlled by a count field.
- The subfields of the datum are defined below.
- There is no white space separating any of the fields.
- They are either defined by position and count or by the field separator
- (FS) character.
- .KS
- .DS
- datum ::= serial-number entry-list
- serial-number ::= <two bytes used as unsigned short>
- entry-list ::= entry
- | entry-list FS entry
- entry ::= code-char value
- code-char ::= <printable ascii char matching channel char>
- value ::= <arbitrary compressed printable ascii string>
- FS ::= '\e034'
- .ce
- Fig. 2, Internal Record Format
- .DE
- .KE
-
- .NH 2
- Algorithms
- .PP
- These algorithms convert a sequential database into the random access
- database suitable for the online MMDF.
-
- .NH 3
- Partial Ordering Control
- .PP
- At the start of processing an input file the entry under the key
- "Local-Name" is retrieved and the serial number field is retained for
- further processing.
- If the entry is not found then the serial number is initialized to 1 and
- the key "Local-Name" is stored.
- The name is stored again at the end of processing with a datum containing
- the updated serial number.
- Any values stored under the key are preserved so that the key may be
- used to store other information in addition to the serial number.
- In future, this entry can be used to store the host's local name instead
- of the separate file currently used.
-
-
- .NH 3
- Channel Encoding
- .PP
- Channel code characters are passed to the program with the sequential
- data file.
- The code characters are retained by the program and prepended to values
- as they are stored in the database.
- There is currently one code character assigned for each channel in the
- system as well as the '=' character which identifies an official name.
-
- .NH 3
- Identical Keys
- .PP
- Identical keys are allowed in the sequential database.
- They are implemented in the random database by concatenating the entries
- according to the record format in Fig 2.
- The key part of an input line is retrieved before it is stored into the
- database.
- The retrieved datum is used instead of an initialized empty datum
- whenever the retrieval returns a valid datum.
- The new entry is appended to the end along with its code character and
- separated from the rest of the retrieved entry by the FS character.
-
- .NH 3
- Official Name Processing
- .PP
- An official name in the sequential database is the first name associated with
- a particular address.
- Official names are detected and marked in the random access database in a
- similar manner.
- Official names are entered into the database under the control of a command
- line argument; the default action is to not process official names.
- .PP
- As the sequential input file is read for entries to store in the database, the
- value, i.e., address, part is \fIfetch\fPed to see if it is present.
- If the
- .B fetch
- returns a valid pointer to an official name then it will be used to build
- the datum for the current input line.
- If the pointer is valid but the value is not an official name then
- the official name is appended using the format from Fig 2 and datum is
- stored back into the database.
- A NULL pointer indicates that the current entry is an official name.
- The value from the address part (from above) is used as a key and the key
- part of the current input line is used as the value for the new entry.
- The entry has a '=' code character and the value
- is 'canonicalized', i.e., the first character and the
- first character after a '-' are capitalized.
- This official name entry is always stored as the first value in the datum for
- all names, including itself, that refer to it.
- The string is stored in every datum already canonicalized in order to
- save processing time for MMDF.
- .NH 3
- Name Circuit Detection
- .PP
- A name circuit is exists if there is a valid key to the database that is
- equal to the value in the current input line that is not
- the inverse key of an official name.
- This can occur only if there was a previous entry in the database that
- had a key equal to the value of the current entry.
- The input line is stored as an additional value in the entry.
- (See Section 5.3.3).
- There is a change in the way name circuits are rejected by the database
- because the partial ordering for the entry remains the same.
- Any identical keys which would have been valid in the sequential database
- because they were positionally correct are no longer valid because they
- are, in effect, moved up to be just below (actually part of)
- the first, out of order key.
- This restriction prohibits certain obscure or clever alias list
- manipulations.
- .PP
- Forward references, i.e., the value field of a previous entry is equal to
- the key of the current entry, are allowed in the sequential database since
- the linkages will terminate at the end of file.
- These are allowed in the random access database too because the database
- procedures will check sequence numbers when requested by the calling
- procedure and reject all fetches that are not
- forward references thereby preventing the name circuit.
- .NH 2
- Error Processing
- .PP
- All system and database inconsistency errors are reported to the
- operator on the standard error output.
- All improperly formed entries
- are rejected.
-
-